5.4.3 Cluster and robust estimation
The options robust and cluster() are specified separately to request robust or cluster estimation, respectively. Both present the regression estimates with adjusted standard errors for the estimated coefficients. The associated t-, z- and p-values are affected as well. All other values are the same as in standard estimation.
Note that robust and cluster() cannot be used in combination (cluster() implies robust estimation).
Robust estimation can be used where there is a suspicion of problematic outliers or heteroskedasticity.
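To make the effect of robust estimation concrete, the following Python sketch fits a one-variable regression and computes both the classical and a heteroskedasticity-robust standard error for the slope. It is only an illustration under stated assumptions: the function name ols_with_robust_se and the data are hypothetical, and the HC1 small-sample correction is an assumed choice, not necessarily the variant the system applies. The slope estimate is the same in both cases; only the standard error changes.

```python
import math


def ols_with_robust_se(x, y):
    """Fit y = a + b*x by OLS.

    Returns (slope, classical_se, robust_se) for the slope b.
    Hypothetical helper for illustration only.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # OLS estimates; these are identical with and without robust estimation.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical standard error: assumes one common error variance.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    classical_se = math.sqrt(s2 / sxx)

    # Robust standard error: each observation contributes its own squared
    # residual. The factor n / (n - 2) is the HC1 correction (an assumption).
    meat = sum(((xi - x_mean) * e) ** 2 for xi, e in zip(x, resid))
    robust_se = math.sqrt(n / (n - 2) * meat / sxx ** 2)

    return slope, classical_se, robust_se


# Made-up data in which the spread of y grows with x (heteroskedasticity).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.5, 7.5, 11.0, 11.0, 16.0, 14.0]

slope, classical_se, robust_se = ols_with_robust_se(x, y)
print("slope:", round(slope, 4))
print("classical SE:", round(classical_se, 4))
print("robust SE:", round(robust_se, 4))
```

Because the t-, z- and p-values are derived from the standard error, they change along with it, while the coefficient itself does not.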
Cluster estimation is used when it is suspected that there are
systematic dependencies within groups of observations, e.g. within
schools or municipalities. The groups are specified through a variable
(the cluster variable), which is given in the parentheses of the cluster()
option, e.g. cluster(school) or cluster(municipality). The following conditions apply; otherwise the system gives an error message:
- The number of groups must be of a certain size.
- The cluster variable must be numeric.
- The cluster variable cannot be included as a variable in the regression expression.
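As a rough illustration of what cluster estimation does, the Python sketch below computes a cluster-adjusted standard error for the slope of a one-variable regression. Residuals are first summed within each group, so dependence inside a group is carried into the standard error. The function name ols_cluster_se, the data, and the finite-sample correction G/(G-1) * (n-1)/(n-k) are assumptions made for this example, not a description of the system's implementation. The sketch only requires at least two groups, which the formula needs in order to be defined. The system's actual minimum number of groups is not stated here.

```python
import math
from collections import defaultdict


def ols_cluster_se(x, y, cluster):
    """Fit y = a + b*x by OLS; return (slope, cluster_se) for the slope b.

    `cluster` holds one numeric group id per observation.
    Hypothetical helper for illustration only.
    """
    n = len(x)
    n_groups = len(set(cluster))
    if n_groups < 2:
        # The correction factor below divides by (n_groups - 1).
        raise ValueError("at least two groups are required")

    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx  # same coefficient as in standard estimation
    intercept = y_mean - slope * x_mean
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Sum the score (x_i - x_mean) * e_i within each group before squaring.
    scores = defaultdict(float)
    for xi, e, g in zip(x, resid, cluster):
        scores[g] += (xi - x_mean) * e
    meat = sum(s ** 2 for s in scores.values())

    k = 2  # estimated parameters: intercept and slope
    correction = n_groups / (n_groups - 1) * (n - 1) / (n - k)  # assumed
    cluster_se = math.sqrt(correction * meat / sxx ** 2)
    return slope, cluster_se


# Made-up data: eight observations in four groups (e.g. four schools).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.5, 7.5, 11.0, 11.0, 16.0, 14.0]
school = [1, 1, 2, 2, 3, 3, 4, 4]

slope, cluster_se = ols_cluster_se(x, y, school)
print("slope:", round(slope, 4))
print("cluster SE:", round(cluster_se, 4))
```

If every observation is placed in its own group, this correction reduces to n/(n-k) and the result coincides with an HC1 robust standard error, which is one way to read the statement that cluster estimation implies robust estimation.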
Examples:
regress income man married high_education, robust
regress income man married high_education, cluster(municipality)
The robust and cluster() options can also be used with other regression types.